02-19-24 (Friday)

Lord, we come to you asking for your wisdom to navigate the complex technological world in which you have placed us and your church.
We see the great powers that are placed today in the hands of big IT companies and institutions.
Give us the grace and strength to resist these powers when they threaten our lives and the lives of our neighbors.
And, when we can’t, be with us as we suffer the brokenness of the culture we live in.
Just as our Lord Jesus suffered, give us the same hope that he had - that you have the power to make all things new.
Let this hope shine forth, so that more people may come and glorify you.

Lord, yours is the power, and you alone can do great things.
We are sure we shall see your goodness in the land of the living (Psalm 27:13-14).
Let us be strong, take heart and wait for you, always. Amen.

1 Context managers and with syntax

  • Context managers are special types of classes/objects that create a kind of “context” around some operation, setting up some things first and then cleaning everything up when it is done.
    • To use a context manager, you use Python's with statement, which calls __enter__ at the start of the block and __exit__ at the end.
  • For example, suppose you have a class called Transaction which represents a bank transaction. Take a look at the following code:
class Transaction:
    def __enter__(self):
        print("Beginning transaction")
        # Do everything needed before doing the transaction - connect to databases, open files, etc.
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Do everything needed after finishing it - disconnect, close files, etc.
        print("Finished transaction")

# Using the context manager
with Transaction():
    print("Performing transaction operations")
Output:
Beginning transaction
Performing transaction operations
Finished transaction
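  • The cleanup step also runs when the block fails. The following is a minimal sketch of that behavior, not part of the original example: the status attribute and the "insufficient funds" error are assumptions made up for illustration.

```python
class Transaction:
    def __enter__(self):
        self.status = "open"  # hypothetical attribute, used only to observe what happened
        print("Beginning transaction")
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # exc_type is None when the block finished without raising an exception
        if exc_type is None:
            self.status = "finished"
            print("Finished transaction")
        else:
            self.status = "failed"
            print(f"Transaction failed: {exc_value}")
        return False  # returning False (or None) lets the exception propagate


tx = Transaction()
caught = None
try:
    with tx:
        raise ValueError("insufficient funds")
except ValueError as err:
    caught = str(err)

print(tx.status)  # failed
print(caught)     # insufficient funds
```

  • Notice that __exit__ was called even though the block raised an exception, and that the exception still reached the surrounding try/except.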
  • Another example: a timer which is also a context manager:
import time

class Timer:
    def __enter__(self):
        self.start_time = time.perf_counter()  # clock intended for measuring elapsed time
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.end_time = time.perf_counter()
        elapsed_time = self.end_time - self.start_time
        print(f"Time taken: {elapsed_time:.2f} seconds")

# Using the context manager
with Timer():
    # Do some time-consuming operation, for example:
    time.sleep(2)
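  • The value returned by __enter__ can be captured with "as". The sketch below is a variation of the timer, assuming we would rather store the result than print it inside __exit__; the elapsed attribute is a name chosen here for illustration.

```python
import time


class Timer:
    def __enter__(self):
        self.start_time = time.perf_counter()
        self.elapsed = None  # filled in by __exit__
        return self  # this object is what "as t" receives

    def __exit__(self, exc_type, exc_value, traceback):
        self.elapsed = time.perf_counter() - self.start_time


with Timer() as t:
    time.sleep(0.1)  # stand-in for a time-consuming operation

# The timer object is still available after the block ends
print(f"Time taken: {t.elapsed:.2f} seconds")
```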

1.1 File objects in Python ARE context managers

  • The file object returned by open() is itself a context manager, so a with block opens the file and closes it automatically when the block ends, even if an error occurs.
with open('example.txt', 'w') as f:  # returns a file object with "open", which will be referred to as "f"
    f.write('Hello, world!')
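  • To see what the with block saves us from writing, here is a rough sketch comparing it with manual try/finally cleanup. The temporary directory is only an assumption to keep the example from touching your own files.

```python
import os
import tempfile

# Work in a temporary directory (an assumption made for this sketch)
path = os.path.join(tempfile.mkdtemp(), "example.txt")

# With a context manager: the file is closed when the block ends
with open(path, "w") as f:
    f.write("Hello, world!")
closed_after_with = f.closed
print(closed_after_with)  # True

# Roughly the same thing written by hand
g = open(path, "a")
try:
    g.write(" Goodbye!")
finally:
    g.close()  # runs even if write() raises an exception

with open(path) as f:
    content = f.read()
print(content)  # Hello, world! Goodbye!
```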

2 Handling Binary Files

  • If we don’t want to deal only with text files, Python has a way to read and write any file as a binary sequence.

  • You can check how any file is composed of binary numbers by opening it in a hexadecimal viewer/editor such as https://hexed.it/

2.1 Writing to a binary file

  • You will have to use the types bytes and bytearray to deal with binary data. For example, the following code creates a binary file containing the integers in the num list, one byte each.
  • Notice that a file is opened as binary by adding the b character to the second argument in open:
num = [5, 10, 15, 20, 25]  # each value must be in the range 0-255
arr = bytearray(num)
with open("binfile.bin", "wb") as f:
    f.write(arr)

  • Some exploration activities:
    • Check how the type bytearray works (exploring the variable arr)
    • Open the file “binfile.bin” in a hexadecimal editor and try to see the binary numbers there
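  • As a possible starting point for the first activity, this sketch pokes at a bytearray built from the same list of numbers. The variable names hex_view, frozen and error are chosen here for illustration.

```python
num = [5, 10, 15, 20, 25]
arr = bytearray(num)

print(type(arr))  # <class 'bytearray'>
print(len(arr))   # 5
print(arr[0])     # 5 (indexing gives back an integer)

# The same bytes written in hexadecimal, as a hex editor would show them
hex_view = arr.hex(" ")
print(hex_view)   # 05 0a 0f 14 19

# A bytearray is mutable; bytes is its immutable counterpart
arr[0] = 255
frozen = bytes(arr)
print(list(frozen))  # [255, 10, 15, 20, 25]

# Each element must fit in one byte (0 to 255)
error = None
try:
    bytearray([256])
except ValueError as err:
    error = str(err)
print(error)
```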

2.2 Reading from a binary file

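  • You can read a binary file by adding the b character to the second argument in open; read() then returns a bytes object:
with open("binfile.bin", "rb") as f:
    data = f.read()  # bytes object with the whole file content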
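  • A small round-trip sketch: it writes the five numbers to a file and reads them back in two steps. The temporary directory is an assumption to keep the example self-contained.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "binfile.bin")

# Write five bytes
with open(path, "wb") as f:
    f.write(bytearray([5, 10, 15, 20, 25]))

# Read them back: read(n) returns at most n bytes, read() returns the rest
with open(path, "rb") as f:
    first_two = f.read(2)
    rest = f.read()

print(list(first_two))  # [5, 10]
print(list(rest))       # [15, 20, 25]
```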

2.3 Example: reading data from a bmp file

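  • A bitmap (BMP) file is a standard image format. In its most common variant it starts with a 54-byte header (a 14-byte file header followed by a 40-byte info header) specifying the size of the image and other parameters.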
  • You can read the following code and try to figure out what the function read_bmp_pixels is doing.
    • Notice the function seek() that is used to position a kind of “cursor” at some point of the binary file
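def read_bmp_pixels(file_path):
    with open(file_path, 'rb') as f:
        # Read header (14-byte file header + 40-byte info header)
        header = f.read(54)

        # Extract the pixel data offset, width and height from the header
        pixel_offset = int.from_bytes(header[10:14], byteorder='little')
        width = int.from_bytes(header[18:22], byteorder='little', signed=True)
        height = int.from_bytes(header[22:26], byteorder='little', signed=True)

        # Calculate the size of pixel data for a 24-bit BMP (3 bytes per pixel);
        # each row is padded to a multiple of 4 bytes
        row_size = (width * 3 + 3) // 4 * 4
        pixel_data_size = row_size * abs(height)  # height is negative for top-down images

        # Move file pointer to the beginning of pixel data
        f.seek(pixel_offset)

        # Read pixel data
        pixel_data = f.read(pixel_data_size)

    return pixel_data, width, abs(height)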

# Example usage
file_path = 'example.bmp'
pixels, width, height = read_bmp_pixels(file_path)
print(f"Width: {width}, Height: {height}")
print(f"Total Pixels: {width * height}")
print(f"Pixel data size: {len(pixels)} bytes")
  • Later, try to open a bmp file from your computer and recognize the color pixels in this binary data. In a 24-bit BMP every pixel is represented by three bytes, stored in the order blue, green, red (BGR), and rows are usually stored from the bottom of the image to the top.
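  • If you do not have a bmp file at hand, the sketch below builds a tiny 2x2 image in memory and reads single pixels back. It assumes an uncompressed 24-bit BMP with positive height (rows stored bottom-up); the helper pixel_at and the chosen colors are made up for illustration.

```python
import struct

width, height = 2, 2
row_size = (width * 3 + 3) // 4 * 4  # 6 bytes of pixels padded to 8
padding = bytes(row_size - width * 3)

# Pixels are stored as blue, green, red; the bottom row comes first
bottom_row = bytes([255, 0, 0, 255, 255, 255]) + padding  # blue, white
top_row = bytes([0, 0, 255, 0, 255, 0]) + padding          # red, green
pixel_data = bottom_row + top_row

# 14-byte file header: signature, file size, two reserved fields, pixel offset
file_header = struct.pack("<2sIHHI", b"BM", 54 + len(pixel_data), 0, 0, 54)
# 40-byte info header: header size, width, height, planes, bits per pixel,
# compression, image size, x/y resolution, palette colors, important colors
info_header = struct.pack("<IiiHHIIiiII", 40, width, height, 1, 24, 0,
                          len(pixel_data), 2835, 2835, 0, 0)
bmp = file_header + info_header + pixel_data


def pixel_at(data, x, y):
    """Return (r, g, b) at column x and row y, with row 0 at the top."""
    offset = int.from_bytes(data[10:14], "little")
    w = int.from_bytes(data[18:22], "little", signed=True)
    h = int.from_bytes(data[22:26], "little", signed=True)
    row = (w * 3 + 3) // 4 * 4
    start = offset + (h - 1 - y) * row + x * 3  # rows are stored bottom-up
    b, g, r = data[start:start + 3]
    return r, g, b


print(len(bmp))             # 70
print(pixel_at(bmp, 0, 0))  # (255, 0, 0): red, top-left
print(pixel_at(bmp, 1, 1))  # (255, 255, 255): white, bottom-right
```

  • Writing bmp to a file opened with 'wb' should give a file that an image viewer can open, which is one way to check the byte layout by eye.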

3 Serialization

  • Serialization is the process of converting a data structure or object into a format that can be easily stored or transmitted and later reconstructed (“de-serialized”).

3.1 The pickle module

  • In Python, the pickle module provides a very simple way to serialize and deserialize objects.
  • See the following example:
import pickle

# Example object to serialize
data = {'name': 'Alice', 'age': 30, 'city': 'New York'}

# Serialize the object to a file
with open('data.pkl', 'wb') as f:
    pickle.dump(data, f)

# Deserialize the object from the file
with open('data.pkl', 'rb') as f:
    loaded_data = pickle.load(f)

print(loaded_data)
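  • pickle can also serialize to a bytes object in memory instead of a file, using dumps and loads. The sketch below uses made-up data with a tuple and a set to show that Python-specific types survive the round trip. Keep in mind that unpickling can execute arbitrary code, so only load pickle data you trust.

```python
import pickle

# Hypothetical data containing Python-specific types (tuple, set)
data = {"name": "Alice", "scores": (9.5, 8.0), "tags": {"admin", "staff"}}

blob = pickle.dumps(data)      # serialize to bytes
restored = pickle.loads(blob)  # rebuild a new, equal object

print(type(blob))                # <class 'bytes'>
print(restored == data)          # True
print(restored is data)          # False (it is a copy)
print(type(restored["scores"]))  # <class 'tuple'>
```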

3.2 Other serialization formats: JSON and XML

  • JSON (JavaScript Object Notation) and XML (eXtensible Markup Language) are two other popular standard formats for serialization. Unlike pickle, both are text-based and not restricted to Python, so they are widely used to exchange data between different programming languages and applications.

  • JSON is generally more lightweight and easier to work with in web applications, while XML supports attributes, namespaces and schema validation, and is often used in more complex data exchange scenarios.

3.2.1 JSON

  • A JSON file would look like this, for example:
{
  "name": "Alice",
  "age": 30,
  "city": "New York"
}
  • JSON originated in JavaScript, but Python can read and write it with the built-in json module (see the json module documentation).
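  • A minimal sketch of the json module, parsing the example above from a string and writing it back. For files, json.load and json.dump work the same way with a file object.

```python
import json

text = '{"name": "Alice", "age": 30, "city": "New York"}'

# Deserialize: JSON text becomes a Python dict
person = json.loads(text)
person["age"] += 1

# Serialize: Python dict becomes JSON text (indent makes it human-readable)
out = json.dumps(person, indent=2)
print(out)
```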

3.2.2 XML

  • An XML file would look like this:
<person>
  <name>Alice</name>
  <age>30</age>
  <city>New York</city>
</person>
  • Python can handle XML files using the built-in module xml.etree.ElementTree, or with third-party libraries such as xmltodict or lxml.
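  • A minimal sketch reading the example above with xml.etree.ElementTree. Notice that XML has no number type: every value arrives as text and has to be converted by hand.

```python
import xml.etree.ElementTree as ET

xml_text = """<person>
  <name>Alice</name>
  <age>30</age>
  <city>New York</city>
</person>"""

root = ET.fromstring(xml_text)  # parse from a string; ET.parse() reads a file
print(root.tag)                 # person

# Each child element has a tag and a text content
person = {child.tag: child.text for child in root}
print(person)

# Values are strings, so numbers need an explicit conversion
age = int(root.find("age").text)
print(age + 1)  # 31
```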

4 DataFrames

  • A DataFrame is essentially a two-dimensional, labeled data structure with columns of potentially different types, similar to a spreadsheet (Excel) or table.

  • It is the central data structure of pandas, a widely used Python library for data manipulation.

  • You can read CSV, Excel and many other file formats into a DataFrame object. For example:

import pandas as pd

# Read from CSV
df = pd.read_csv('data.csv')

# Read from Excel (.xlsx files need the openpyxl package installed)
df = pd.read_excel('data.xlsx')
  • See the pandas documentation for read_csv and read_excel for the available options.

4.1 Some operations

  • You can view the top or bottom rows of a DataFrame using head() and tail() methods.
print(df.head())  # View the first 5 rows
print(df.tail())  # View the last 5 rows
  • You can access columns and rows of a DataFrame using indexing and slicing.
print(df['Name'])  # Access a column
print(df.iloc[0])  # Access a row by integer position (here, the first row)
  • You can perform basic operations like filtering, sorting, and adding columns.
# Filter rows
filtered_df = df[df['Age'] > 30]

# Sort by a column
sorted_df = df.sort_values(by='Age', ascending=False)

# Add a new column
df['IsAdult'] = df['Age'] >= 18

4.2 Example: reading a CSV file with Pokémon Cards data and creating a GUI

The data file can be downloaded here.

import pandas as pd
from guizero import App,ListBox,Text

def set_text():
    # Look up the first card whose name matches the selected list item
    card = cards[cards['name'] == cards_lb.value].iloc[0]
    t = cards_lb.value
    t += '\nType: ' + str(card['types'])
    t += '\nHP: ' + str(card['hp'])
    text.value = t

cards = pd.read_csv("pokemon-tcg-data.csv")
cards = cards[cards['set']=='Jungle'][:100] # filtering only by one set

app = App(title="Pokémon Cards")
cards_lb = ListBox(app, height='fill', align='left', width=100,
                   items=list(cards['name']),
                   scrollbar = True,
                   command=set_text)
text = Text(app, height='fill', width='fill', align='right')

app.display()

5 Data Journeys

  • What are our files/data about?
  • Sabina Leonelli is a researcher who proposed following the journey that data make as they pass through different people and applications.
  • Some questions to ask:
    • How was this data collected, prepared and reported? What were the intentions, ideas, presuppositions?
    • How is it cleaned, ordered, transformed, or reused?
    • How is this data maintained, or, what are the infrastructures around it?
    • How is it disseminated and made accessible?
    • How is it visualized and analysed?
  • Again, all of these steps presuppose values and views of a good life.
  • What could we compromise if we step into the middle of this process?
  • Can we discern virtue or vice in these steps?